<% String title="DAS Ontology extension";
   String header = title;
%>

<%@ include file="sangerheader.jsp" %>

<div id="main">

<p>
This page is part of the <a href="spec_1.53E.jsp">DAS 1.53E specification</a>.
</p>



<h2>Structuring DAS Protein Feature Annotation</h2>


<P CLASS=western LANG=en-GB>
  A primary aim of the BioSapiens project is the integration of protein sequence
  annotation from regionally distributed data providers. To achieve this, we
  have chosen to implement the Distributed Annotation System (DAS). DAS
  annotation is provided in the form of features, each composed of a feature
  type, a feature description, and a sequence position.
</P>

<P CLASS=western LANG=en-GB>
  Two factors are important when integrating annotations from different
  sources. Firstly, the terms that are used need to be standardised so that
  'like' terms can be identified and compared. Secondly, evidence must be
  provided to describe how each annotation was created. Some features are
  annotated by a curator from experimental evidence in the literature (for
  example, the UniProt/SwissProt annotations). Whilst these data are an accurate
  source of information, annotation by more automatic methods provides much
  greater coverage.
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  An effective way of structuring feature annotation is to develop an ontology
  of protein feature types. An ontology provides a structured, precisely
  defined, common controlled vocabulary that can still change as new uses
  emerge and new terms are added. We propose the new Protein Feature Ontology,
  jointly developed by the BioSapiens, UniProt, and GO consortia, together with
  the GO evidence ontology, for adoption by BioSapiens partners. In the
  following sections, we describe the ontologies we recommend, as well as minor
  technical changes to the use of the DAS protocol that will allow DAS client
  software to present annotation in a much more structured, user-friendly way.
  An example of the layout of the final format is shown in the figure below:
</P>
<img src="img/das_onto1.jpg"/>

<H2>
  The Protein Feature Ontology
</H2>
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  The Protein Feature Ontology is available from the Ontology Lookup Service
  (OLS):
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">

  <FONT COLOR=#336666><B><A HREF="http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS">http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS</A></B></FONT>
</P>
<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  The Protein Feature Ontology is a set of terms that describe the features
  which make up protein function and form. It is divided into two parts:
  positional terms, which refer to a specific residue or range of residues in
  the protein, and non-positional terms, which refer to the whole protein
  sequence or structure.
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  Within the ontology you will find three types of ontology term IDs:
</P>
<UL>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      BS, identifying BioSapiens-only terms
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      SO, identifying terms from the Sequence Ontology
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      MOD, identifying terms from the protein modification ontology PSI-MOD
    </P>
  </LI>
</UL>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  This is because the ontology is a composite: it takes terms from all three
  sources and links them together to create the final ontology. Details of the
  structure and location of the terms are given below.
</P>
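
<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  As a rough illustration of how a DAS client might use these prefixes, the
  Python sketch below maps a term ID to its source ontology. The function name
  <I>term_source</I> and the accession numbers in the usage lines are
  assumptions made for this example and are not part of the specification.
</P>

```python
# Prefixes listed in the text above, mapped to the ontology they come from.
TERM_SOURCES = {
    "BS": "BioSapiens",
    "SO": "Sequence Ontology",
    "MOD": "PSI-MOD",
}


def term_source(term_id):
    """Return the source ontology for an ID of the form PREFIX:accession.

    Returns None when the ID is malformed or the prefix is not one of the
    three used by the Protein Feature Ontology.
    """
    prefix, separator, accession = term_id.strip().partition(":")
    if not separator or not accession:
        return None  # not of the form PREFIX:accession
    return TERM_SOURCES.get(prefix)


if __name__ == "__main__":
    # The accession numbers here are placeholders chosen for illustration.
    for example in ("BS:00000", "SO:0000000", "MOD:00000", "GO:0000000"):
        print(example, "->", term_source(example))
```

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  A client could use such a check to reject or flag feature types whose IDs do
  not come from one of the three expected sources.
</P>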

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>

  <FONT SIZE=4><B>Non-positional annotations</B></FONT>
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  This section contains information which refers to the whole protein and its
  biological function. Fields include:
</P>
<UL>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      A reference to <I>publication</I>s that exist for that protein. These are
      normally taken from the publications listed by the UniProt/SwissProt
      curators.
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      A <I>family_annotation</I>, a free-text indication of the family to
      which the protein belongs.
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      <I>Functional_annotation</I> that is either a free-text description,
      an <I>EC_annotation</I> for enzymes, or a <I>GO_annotation</I>.
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      <I>Potentially mispredicted protein sequences</I>, using the
      <I>erroneous_protein</I> and <I>abnormal_protein</I> categories.
    </P>
  </LI>
</UL>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  <FONT SIZE=4><B>Positional annotations</B></FONT>
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  Positional features describe primary information, that is, features that are
  annotated on the protein sequence or structure and located on a particular
  residue or subsequence of the peptide. These terms clearly fall within the
  scope of the Sequence Ontology (SO), an ontology provided by the GO consortium
  for describing biological sequences. As a result, they have been integrated
  into the SO for further development, and are filtered from the SO and added
  to the Protein Feature Ontology automatically. More details on the
  classification of these features can be found in the ontology.
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  <B><FONT SIZE=4>Additional terms to describe post-translational
  modifications</FONT></B>
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  In addition to these terms, members of the BioSapiens Network of Excellence
  (NoE) also provide annotations for post-translational modifications. For
  these annotations, the PSI-MOD terms will be used. For ease of use, these
  terms have been integrated into the Protein Feature Ontology under the
  <I>post_translational_modification</I> term.
</P>

<P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
  <FONT COLOR=#ff6600><FONT SIZE=5><B>To do:</B></FONT></FONT>
</P>

<OL>
  <LI>
    <P ALIGN=LEFT STYLE=MARGIN-TOP:0.19in>
      <B><FONT COLOR=#ff6600>For each feature type you provide, please browse
      the ontology
      (</FONT></B><FONT COLOR=#336666><B><A HREF="http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS"><FONT COLOR=#ff6600>http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=BS</FONT></A></B></FONT><B><FONT COLOR=#ff6600>)
      and select the term which describes your feature. If your feature is not
      present in the ontology, or there is some other problem, please notify me
      immediately
      (</FONT></B><FONT COLOR=#336666><B><A HREF="mailto:gabby@ebi.ac.uk"><FONT COLOR=#ff6600>gabby@ebi.ac.uk</FONT></A></B></FONT><B><FONT COLOR=#ff6600>).</FONT></B>
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      <FONT COLOR=#ff6600><B>The ontology term and its reference ID (beginning
      with SO:, BS: or MOD:) must be added to the TYPEs command (see the
      figures below for the correct format).</B></FONT>
    </P>
  </LI>
  <LI>
    <P ALIGN=JUSTIFY CLASS=western LANG=en-GB>
      <FONT COLOR=#ff6600><B>The FEATURE tag (label identifier) carries any
      information that is specific to that feature in that particular
      protein.</B></FONT>
    </P>
  </LI>
</OL>

<H2 CLASS=western>
  <U>Evidence codes (ECO)</U>
</H2>
<P ALIGN=LEFT CLASS=western LANG=en-GB>

  Currently, DAS annotations derived from manual curation or experimental
  evidence are indistinguishable from annotations which have been predicted. To
  allow a fine-grained attribution of data source type, terms from the Evidence
  Code Ontology (ECO) should be used to classify each DAS feature annotation.
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  <FONT COLOR=#ff6600><FONT SIZE=5><B>To do:</B></FONT></FONT>
</P>
<OL>
  <LI>
    <P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
      <B><FONT COLOR=#ff6600>For each feature you provide, select an evidence
      code from the ECO terms listed at
      </FONT></B><FONT COLOR=#336666><B><A HREF="http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=ECO"><FONT COLOR=#ff6600>http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=ECO</FONT></A></B></FONT>
    </P>
  </LI>
</OL>

<H2 CLASS=western>
  <U>Avoiding data redundancy </U>
</H2>
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  The introduction of standardised feature types and evidence codes will already
  allow DAS clients to provide a much more user-friendly interface. However, we
  still need to address the problem of high redundancy in the data provided by
  multiple servers.
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">

  Example:
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  UniProt provides many different annotations of domains, for example, the SMART
  domain. In this case, the server is "UniProt" and the feature type is "SMART".
  However, SMART also provides a DAS server, for which "SMART" is the server
  and "SMART domain" is the feature type. The same domain on the same protein
  is therefore annotated twice. To allow DAS clients to detect such redundancy,
  we need to provide information on the primary source of each feature
  annotation.
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  In the new structure, the feature type will be "domain" in both cases. The
  source information will be provided in the "METHOD" tag of the DAS response,
  as shown in the table below.
</P>
<CENTER>
<TABLE BORDER=1 BORDERCOLOR=#c0c0c0 CELLPADDING=0 CELLSPACING=3 WIDTH=629>
  <COL WIDTH=189> <COL WIDTH=263> <COL WIDTH=163>

  <TR>
    <TD WIDTH=189>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        <B>Server</B>
      </P>
    </TD>
    <TD WIDTH=263>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>

        <B>Feature Type </B>
      </P>
    </TD>
    <TD WIDTH=163>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        <B>Method</B>
      </P>
    </TD>

  </TR>
  <TR>
    <TD WIDTH=189>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        UniProt
      </P>
    </TD>
    <TD WIDTH=263>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>

        Domain
      </P>
    </TD>
    <TD WIDTH=163>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        SMART
      </P>
    </TD>
  </TR>
  <TR>

    <TD WIDTH=189>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        SMART
      </P>
    </TD>
    <TD WIDTH=263>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        Domain
      </P>
    </TD>

    <TD WIDTH=163>
      <P ALIGN=CENTER CLASS=western LANG=en-GB>
        SMART
      </P>
    </TD>
  </TR>
</TABLE>
</CENTER>
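
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  One way a DAS client could use this convention to detect redundancy is
  sketched below in Python. It assumes that features have already been parsed
  into dictionaries; the key names (server, segment, type, method, start, end),
  the segment name and the coordinates are hypothetical and chosen only for
  this example.
</P>

```python
from collections import defaultdict


def find_redundant(features):
    """Group features that share protein, type, method and position.

    Returns only the groups that are reported by more than one server.
    """
    servers_by_key = defaultdict(set)
    for feature in features:
        key = (
            feature["segment"],
            feature["type"].lower(),
            feature["method"].lower(),
            feature["start"],
            feature["end"],
        )
        servers_by_key[key].add(feature["server"])
    return {k: sorted(v) for k, v in servers_by_key.items() if len(v) > 1}


# Hypothetical parsed features mirroring the two table rows above,
# plus one in-house annotation that should not be flagged.
EXAMPLE_FEATURES = [
    {"server": "UniProt", "segment": "EXAMPLE", "type": "Domain",
     "method": "SMART", "start": 10, "end": 80},
    {"server": "SMART", "segment": "EXAMPLE", "type": "Domain",
     "method": "SMART", "start": 10, "end": 80},
    {"server": "UniProt", "segment": "EXAMPLE", "type": "Domain",
     "method": "UniProt", "start": 100, "end": 150},
]

if __name__ == "__main__":
    for key, servers in find_redundant(EXAMPLE_FEATURES).items():
        print(key, "is provided by", servers)
```

<P ALIGN=LEFT CLASS=western LANG=en-GB>
  Matching on exact coordinates is a simplification made for this sketch; a
  real client might also want to treat overlapping ranges as candidates for
  merging.
</P>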

<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  <FONT COLOR=#ff6600><FONT SIZE=5><B>To do:</B></FONT></FONT>
</P>
<OL>
  <LI>
    <P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
      <FONT COLOR=#ff6600><B>If you annotate by running someone else's method
      or by transferring data from another database, does the source of the
      annotation also have a DAS server?</B></FONT>
    </P>
  </LI>
</OL>

<UL>
  <LI>
    <P ALIGN=LEFT>
      <FONT COLOR=#ff6600><B>If yes, annotate the METHOD tag with the nickname
      of this server as given in the DAS Registry at
      http://www.dasregistry.org/.<BR>
      Please see the figures below for the actual format.</B></FONT>
    </P>
  </LI>
  <LI>
    <P ALIGN=LEFT>
      <FONT COLOR=#ff6600><B>If no, write the method name into this field.
      Please take care to use the actual name of the program exactly as it is
      published and spelt.</B></FONT>
    </P>
  </LI>
  <LI>
    <P ALIGN=LEFT STYLE=MARGIN-BOTTOM:0.19in>
      <FONT COLOR=#ff6600><B>If the annotation you provide in this track is
      derived from your own in-house method, please fill the METHOD tag with
      the name of your server.</B></FONT>
    </P>
  </LI>
</UL>
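
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  The decision rules above can be summarised in a few lines of Python. This is
  only a sketch: the function and parameter names are invented for the example
  and are not part of the DAS protocol.
</P>

```python
def method_tag_value(own_server, external_method=None, source_nickname=None):
    """Choose the text for the METHOD tag following the rules above.

    own_server      -- name of the server providing the track
    external_method -- published name of someone else's method or database,
                       or None if the annotation comes from an in-house method
    source_nickname -- DAS Registry nickname of the source's own DAS server,
                       or None if the source has no DAS server
    """
    if external_method is None:
        return own_server        # in-house method: use your server name
    if source_nickname is not None:
        return source_nickname   # source has a DAS server: use its nickname
    return external_method       # otherwise: the method name as published
```

<P ALIGN=LEFT CLASS=western LANG=en-GB>
  For the SMART example, a call such as
  <I>method_tag_value("UniProt", "SMART", "SMART")</I> would yield "SMART",
  assuming for illustration that the registry nickname of the SMART server is
  also "SMART".
</P>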
<H3 CLASS=western>
  <FONT SIZE=5><U>Updates/Questions/Comments </U></FONT>
</H3>
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  We expect the Protein Feature Ontology to evolve over the next few weeks to
  reflect the needs of the BioSapiens consortium. Questions, comments, and term
  requests should be sent to
  <FONT COLOR=#336666><B><A HREF="mailto:gabby@ebi.ac.uk">gabby@ebi.ac.uk</A></B></FONT>.
</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  All questions and comments will be added to the
  <FONT COLOR=#336666><B><A HREF="http://www.ebi.ac.uk/seqdb/jira/secure/Dashboard.jspa">BioSapiens
  Ontology JIRA Tracker system</A></B></FONT>.

</P>
<P ALIGN=LEFT STYLE="MARGIN-TOP:0.19in; MARGIN-BOTTOM:0.19in">
  You do not need to log in to view these comments. Before sending me an email,
  please check the listed comments to make sure that the issue has not already
  been raised.
</P>
<HR ALIGN=LEFT WIDTH=600>
<H3 CLASS=western STYLE=page-break-before:always>
  <FONT SIZE=5><U>Implementation</U></FONT>
</H3>
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  The following schema shows how to map protein feature information to the DAS
  protocol. We have structured the mapping so that it is backwards compatible
  with existing DAS servers and clients, while allowing modern clients to
  display protein annotation from the BioSapiens consortium in a much more
  user-friendly way.

</P>




<P ALIGN=LEFT CLASS=western LANG=en-GB>
  <FONT SIZE=4><B>Changes have been implemented by UniProtKB</B></FONT>
</P>


<P ALIGN=LEFT CLASS=western LANG=en-GB>
  UniProtKB have already implemented the changes. Please see their DAS server
  for more help and information:
</P>
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  <FONT COLOR=#336666><B><A HREF="http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/features?segment=P03973">http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot/features?segment=P03973</A></B></FONT>
</P>
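
<P ALIGN=LEFT CLASS=western LANG=en-GB>
  For readers who want to query a server programmatically, the Python sketch
  below builds a features request of the same form as the URL above. The helper
  name <I>features_url</I> is an assumption for this example; the code only
  constructs the URL and does not contact the server.
</P>

```python
from urllib.parse import urlencode


def features_url(source_url, segment):
    """Build a DAS features request URL for one segment (e.g. a protein ID)."""
    query = urlencode({"segment": segment})
    return f"{source_url.rstrip('/')}/features?{query}"


# Data source URL taken from the UniProtKB link above.
UNIPROT_DAS = "http://www.ebi.ac.uk/das-srv/uniprot/das/uniprot"

if __name__ == "__main__":
    print(features_url(UNIPROT_DAS, "P03973"))
```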

<img src="img/das_onto2.jpg"/>


 
 
<%@ include file="sangerfooter.jsp" %>
